INTRODUCTION 


In this paper we examine the uses of teaching performance assessments (TPAs) as 
resources for learning, program evaluation, and improvement in teacher education. 
In undertaking our review, we take note of our positionality as practitioners within the 
field of educator preparation. Our professional work as educators is carried out largely 
at the program level, inside the practical dilemmas of policy and practice that emerge 
as we attempt to use the TPA tools described in this paper. Our review is intended to 
be of value to policymakers, faculty, and academic leaders as they navigate the
challenges and opportunities of using TPAs as resources for improving the work of teacher 
preparation. 

Given this positionality, we conceptualize TPAs not simply as tests, but as historically
situated forms of activity wherein teacher educators, preservice teachers, and policymakers 
navigate a complex interplay of policy mandates, conceptual and material resources, 
organizational conditions, and motivations related to diverse and often contested goals 
for assessment, accountability, and program improvement. This view is consonant with 
contemporary conceptualizations of assessment “validity” in which the consequences 
of using an assessment tool are evaluated in concert with its psychometric properties 
(Kane, 2013; Messick, 1994). Moss (2013) has elaborated on the idea of consequential 
validity to include attention to how assessment data are used in practice, arguing that 
“the focus of validity questions will need to shift again to the broader learning or 
organizational environment and the extent to which it is sufficiently well resourced to 
support an evidence-based professional practice that enhances student learning” (p. 96). 
In this paper, we are particularly interested in clarifying the conditions under which 
TPAs function (or do not function) as useful tools for decision-making related to issues 
of teacher licensure, program evaluation, and program improvement. 

We begin by outlining our conceptual framing and related research questions about 
the uses of TPAs as resources for program evaluation and improvement. We describe 
some of the defining features and affordances of TPAs and consider how these measures 
engage traditional questions and concerns about the validity and generalizability of 
the findings they yield. Our discussion then proceeds with a brief history of teaching 
portfolios, locally developed performance assessments, and standardized measures 
of teaching performance, noting how the purposes of these assessments have been 
shaped by the increasing emphasis on external accountability in public policy over 
recent decades. Using this historical account as context, we focus the main body of our 
review on the contemporary research literature related to the uses of standardized TPAs 
in preservice teacher education. We conclude the paper with a set of recommendations 
for policy, practice, and research aimed at evaluating and improving the uses of TPAs 
as resources for the improvement of teacher education. 


CONCEPTUAL FRAMING AND RESEARCH QUESTIONS 


In this section we identify a set of working assumptions about the role of contextual 
factors in shaping how TPAs are used as tools for measuring and improving preservice 
teaching quality. First, as noted above, we assume that TPAs, as a specific form of
professional activity, are embedded in larger historical and policy contexts that shape how 
they are understood and enacted. We assume that these policy contexts are themselves 
affected by interest group advocacy, including the actions of professional organizations. 
The types and uses of assessment in teacher education have changed dramatically over 
the past two decades as the public policy zeitgeist has shifted toward accountability 
goals (Bales, 2006; Cochran-Smith et al., 2018). 

We assume that the effects of macro-level changes in public perception and public 
policy related to teacher quality and teacher education are mediated by local conditions 
affecting how assessment data are used in practice (Davis & Peck, 2020; Spillane &
Miele, 2007). We attend particularly to the ways in which the values, beliefs, and
perceptions of the people involved in program-level work shape their approaches to TPA 
implementation. We also consider what may be gleaned from the literature about the 
different ways in which TPAs may be used as tools for decision-making. Our interests 
include the variety of ways in which TPAs are scored, and how the data are
disaggregated and represented to support access, interpretation, and action by program faculty
and staff. Finally, we review research related to organizational policies, practices, and
routines that afford or constrain opportunities to use TPAs as a resource for
supporting candidate learning and program improvement, including the role that leadership 
plays in orchestrating the local negotiation of practice related to TPA implementation. 
We use this conceptual framing to engage the following questions: 


1. What are the social and political contexts in which TPAs have emerged as 
prominent and often controversial tools for policy and practice in teacher 
education? 

2. What are the characteristic features and related affordances of TPAs as tools for 
decision-making, including decisions related to candidate licensure, support, 
and program improvement? 

3. How is TPA implementation mediated by the values, beliefs, and motives of the 
faculty, staff, and academic leaders in a teacher education program, the ways 
that the assessment tools are used, and the organizational policies and practices 
of the institutions involved? 


TEACHING PERFORMANCE ASSESSMENTS: WHAT ARE THEY, AND WHAT ARE THEY USED FOR? 


The term “performance assessment” refers to approaches to description and
evaluation of human competence and skill based on evidence collected in the contexts of an
individual’s participation in “authentic” activities of practice. A contemporary
definition of the term offered by the Educational Testing Service refers to a performance 
assessment as “a test in which the test taker actually demonstrates the skills the test 
is intended to measure by doing real-world tasks that require those skills, rather than 
by answering questions asking how to do them” (Educational Testing Service, 2020). 

This is by no means a new idea. Proposals for new forms of educational assessment 
based on performing “real-world” tasks gained momentum in the early 1990s in the 
context of widespread critiques of standardized achievement testing (Shepard, 1991),
including tests used to evaluate the qualifications of prospective teachers
(Darling-Hammond, 1986; Haertel, 1991). Particularly at issue in many of these critiques were
questions about the relevance of traditional teacher test data to the tasks of
improving preservice teachers’ instruction in classrooms. Indeed, Messick (1994) described
improvement of learning outcomes as one of the primary goals of performance
assessment: “Exposure to authentic assessment is expected to provide the student with a
meaningful educational experience that facilitates learning and skill development
as well as deeper understanding of the requirements and standards for good
performance” (p. 17). In the case of teacher education, the hope was that performance
assessments would not only provide a means of evaluating preservice teachers but would
also support their development of classroom practice. 

The term “performance assessment” can refer to a variety of test designs used to 
sample or represent real-world activity, including simulations, projects, essays,
demonstrations, and work products (Davey et al., 2015). In the context of preservice teacher
education, however, the term “teaching performance assessment” most often refers to
work samples or “portfolios” that integrate the collection, analysis, and evaluation of
artifacts and related products derived from actual classroom teaching practice.
Portfolios may consist of relatively informal collections of artifacts gathered over several
months of practicum work in the classroom, or they may be highly standardized
processes in which the candidate must collect and analyze specific artifacts of teaching
over the course of several lessons. These typically include data from
P-12 student classroom assessments, lesson plans, video records of teaching, and
samples of student work, accompanied by analytic and reflective commentaries. 

Data from TPAs are used within preservice teacher education to evaluate individual 
candidate teaching practice, both as formative measures of candidate progress and as 
summative measures related to decisions about licensure. In addition, TPA data may 
be aggregated across candidates for the purpose of program evaluation, both to track
changes in local program learning outcomes over time and to
enable comparisons across programs. Both program-level TPA data and artifacts from 
individual assessment portfolios can be used as resources for learning and program 
improvement. 


Design and Evaluation of TPAs 


TPAs differ from traditional forced-choice measurement methodologies in ways 
that are related to their focus on documentation and evaluation of performance in 
“real-world” contexts. For example, whereas traditional assessments of teacher
knowledge attempt to equalize opportunities to demonstrate competence by standardizing
the conditions under which tests are administered, TPAs typically rely on data from
practice in actual classrooms, which introduces significant variation in the conditions
under which these assessments are conducted. These include differences in students,
cooperating teacher support, and school curriculum policies and practices. The focus
of assessment is on the process of teaching, and equity in opportunity to perform is
pursued by standardizing the collection of planning documents, observational records of
teaching, and samples of P-12 student work, with related analytic commentaries. In
Table 1, below, we identify some of the design parameters and related evaluation
questions that represent important considerations for evaluating TPAs. (For more detailed 
explication and discussion of design principles and evaluation criteria for performance 
assessments, see Davey et al. [2015], Khattri et al. [1998], and Moss [2013].) 
TABLE 1 Design and Evaluation Considerations for Teaching Performance Assessments (TPAs)

| Design Parameter | Key Question | Evaluation Focus | Resources |
|---|---|---|---|
| Content Validity | To what extent does the TPA measure important aspects of teaching practice? | Alignment of TPA with contemporary research on teacher effectiveness and/or professional standards of teaching practice (e.g., Interstate Teaching Assessment and Support Consortium) | Sato, 2014 |
| Generalizability | To what extent are TPA scores consistent across raters? | Published studies of scorer training and inter-rater agreement | Bastian et al., 2016; Pecheone & Chung, 2006 |
| Generalizability | To what extent are TPA scores consistent across variation in students, curriculum domains, or school contexts? | Studies of effects of context variables on TPA scores | Adie & Wyatt-Smith, 2020; Bastian et al., 2020 |
| Predictive Validity | To what extent are TPA scores correlated with measures of socially important educational outcomes? | Studies of the relationship of TPA scores and P-12 student achievement and teacher employment and retention | Bastian et al., 2016; Goldhaber et al., 2017 |
| Consequential Validity | To what extent do TPAs screen ineffective teachers from entering the workforce? | Studies of the uses of TPAs in decision-making | Goldhaber et al., 2017; Ledwell & Oyler, 2016 |
| Consequential Validity | To what extent do TPAs lead to candidate learning? | Studies of preservice teacher learning | Chung, 2008; Lin, 2015 |
| Consequential Validity | To what extent do TPA data lead to program improvement? | Studies of program improvement process and outcomes | De Voto et al., 2020; Lys et al., 2016; Peck et al., 2010 |


The use of TPAs as resources for evaluation and improvement of instructional
practice has been part of a larger vision and agenda for the professionalization of teaching, 
articulated most directly in the work of the National Board for Professional Teaching 
Standards (NBPTS; National Board for Professional Teaching Standards, 2021b). Within 
this view, a defining feature of a profession is its control and authority over standards of 
practice, membership, and evaluation within the profession itself, rather than external 
bureaucracies (Cochran-Smith et al., 2018; Darling-Hammond, 1986; Shulman, 1987a). 
The implicit theory of professionalization involves establishing a relationship between 
consensus on professionally defined standards of practice, localized self-study and 
peer-mediated assessment related to those standards, and commitment to learning in 
and from the contexts of actual practice as a resource for improvement
(Darling-Hammond & Snyder, 2000). Within this framework, classroom-based performance
assessments of teaching are envisioned at multiple points along a continuum of professional
development, beginning with preservice teacher preparation, and continuing through
certification at the residency and advanced levels of professional practice (National
Board for Professional Teaching Standards, 2021a). 

While recognizing the potential of performance assessments as resources for 
improvement of teaching practice, Linn et al. (1991) also identified some of the
tensions that accompany the assessment of complex, “real-world” activity: 


If great weight is attached to the traditional criteria of efficiency, reliability, and
comparability of assessments from year to year, the more complex and time-consuming 
performance-based measures will compare unfavorably with traditional standardized 
tests. (Linn et al., 1991, p. 16) 


Tensions between accountability-driven assessment goals, which press for
standardized control of assessment conditions to facilitate comparison of scores, and
improvement-driven goals, which encourage more detailed description and analysis
of performance in context, have been a recurring theme in the history of the
development and implementation of TPAs over the past decades. Indeed, the design of TPAs 
has shifted over time as the field has responded to changes in policy priorities related 
to these tensions. 


The Teaching Portfolio: Early Research and Development Work 


Early development of the teaching portfolio began with a focus on improving the 
practice of classroom teachers through systematic collection and analysis of artifacts 
from a teacher’s actual work in the classroom and gained national prominence through 
the work undertaken by Lee Shulman and colleagues (Shulman, 1987a) in the Teacher Assessment
Project at Stanford University in the late 1980s. Working with this group, Wolf (1991) defined a 
teaching portfolio as more than a “container” for records of a teacher’s practice, noting 
that “a portfolio also embodies an attitude that assessment is dynamic [emphasis added] and 
that the richest portrayals of teacher (and student) performance are based on multiple 
sources of evidence collected over time in authentic settings” (p. 130). 

In one of the early empirical assessments of the teaching portfolio as a resource for 
learning and improvement of practice, Athanases (1994) reported on a yearlong process 
in which 24 elementary classroom teachers collected artifacts of their literacy-related 
work with their students. Almost all of the teachers reported improvements in their 
teaching as a result of their portfolio work, particularly with respect to their practices 
related to student assessment. Despite a number of methodological limitations noted 
by Athanases (1994), this report provided one of the first well-documented evaluations 
of the potential of “authentic” performance-based assessments of teaching as a resource 
for evaluation and improvement of teaching practice. 

Subsequent studies of the use of teaching portfolios in teacher education programs 
suggested that many of the benefits of the process reported for experienced teachers 
by Athanases were also evident with preservice teachers (Borko et al., 1997;
Darling-Hammond & Snyder, 2000; Zeichner & Wray, 2001). For example, Borko et al. (1997)
studied the experiences of 21 teacher candidates who completed teaching portfolios
during their student teaching practicum. Both written comments and interviews with
the candidates identified opportunities for reflection and improvement of teaching as
the most frequently mentioned benefit of the portfolio process. Later research in
additional teacher education programs affirmed many of the findings of Borko et al.
(1997), including the affordances of the teaching portfolio as an opportunity for
candidate learning and improvement of practice, the importance of faculty and program
support for the portfolio assessment process, and concerns about the time-intensive
nature of the portfolio work (Anderson & DeMeulle, 1998; Snyder et al., 1998). 

Two important themes are evident in these early studies of the teaching portfolio 
process in preservice teacher education. First, it is clear from multiple studies that the 
process of constructing and reflecting on a teaching portfolio represents a rich
opportunity for learning and improvement of practice for preservice teachers (Anderson &
DeMeulle, 1998; Borko et al., 1997; Snyder et al., 1998). Anderson and DeMeulle (1998)
also noted the affordances of portfolio assessments as tools for faculty learning and
program improvement but observed that these were taken up less frequently in the programs
they studied. 

Second, as anticipated by Linn et al. (1991), almost all of the investigations of
portfolio implementation reported significant tensions between the multiple uses of teaching
portfolios as resources for candidate learning, for licensure decisions, and for program
improvement (Borko et al., 1997; Delandshere & Arens, 2003; Snyder et al., 1998;
Zeichner & Wray, 2001). These tensions have become more salient as public concerns about
teacher quality and related policy pressures for teacher education program
accountability and improvement have intensified over the past two decades (Cochran-Smith 
et al., 2018; Delandshere & Petrosky, 2010). 


Changing Policy Context: TPAs and the Press for Accountability 


Concerns about the quality and effectiveness of teacher education programs have 
existed for many decades (Conant, 1963; Goodlad, 1990; Levine, 2006; Sarason, 1993). 
However, these concerns have intensified as public confidence in the effectiveness 
and efficiency of government institutions has declined broadly. Policymakers have 
responded to these changes with mandates for increased external accountability for 
public institutions, increased investment in private-sector actors as alternatives to 
government programs, and increased reliance on market-based theories of program 
improvement (Boreham, 2004; Tröhler et al., 2014). 

Congruent with the larger shift in the policy landscape, public policy interventions 
in the field of teacher education have intensified dramatically over the past two decades 
(Cochran-Smith et al., 2013). The new policies reflect a significant shift away from
reliance on local professional judgment in decisions about teacher licensure toward policies 
emphasizing standardized measurement and external accountability (Cochran-Smith 
et al., 2018; Crowe, 2011). Many of the tensions identified in the early research and 
development work on teaching portfolios have become more salient as these types of 
“authentic” assessments have been appropriated within contemporary policy initiatives 
focused on accountability. For example, whereas early portfolio assessments
in preservice teacher education served as one of many sources of data in local
decisions about teacher licensure, many contemporary policies require passing scores
on a TPA as a condition for licensure. So, while the tensions between the uses of
performance assessments as resources for learning and their reliability and validity when
used as tools for consequential decisions about licensure have been recognized since
some of the earliest research and development work (Gellman, 1993; Linn et al., 1991;
Messick, 1994), policy changes affecting how these tools are used have made these
tensions more visible and more problematic (Gitomer et al., 2019). 

At the same time, the policy pressures for increased standardization and
accountability in teacher licensure have offered the field opportunities to advance the
professionalization of teaching by linking a national conversation aimed at building consensus
around professional standards of teaching practice with the tools of authentic,
practice-based assessment (e.g., NBPTS and the National Board Certification Process), and by 
introducing these tools into state and national teacher education policy initiatives 
(Darling-Hammond, 2010). A crucial design principle underlying this work is that 
both standards of practice and TPAs based on those standards should be developed by 
teachers (and teacher educators) themselves (Haertel, 1991). Indeed, the locus of power 
and control over the design of assessment tools used to evaluate candidate teaching 
performance has been one of the most significant focal points of controversy among 
teacher educators. 


Locally Developed, Standardized TPAs 


One strategic response to the tensions between increased external public policy 
pressures for accountability and local program values regarding authenticity,
autonomy, and agency in assessment of preservice teacher quality is evident in program-level
efforts to develop more standardized tools for assessing teaching performance. An early
example of this approach was the Teacher Work Sample (TWS) methodology
developed at Western Oregon University (Schalock, 1998; Schalock et al., 1997). The TWS
assessment process provided for a standardized collection and evaluation of artifacts
of preservice teachers’ practice collected over a 3- to 5-week period, including a
description of the classroom context, lesson plans, samples of student work, and plans for
revision of instruction based on analysis of student learning data. Follow-up surveys
and interviews with teacher candidates and faculty using the TWS indicated that the
performance assessment process was a valuable resource for the improvement of
teaching practice for both teacher candidates and faculty (Reusser et al., 2007). At the same 
time, data from the TWS provided an important evidentiary warrant for both state 
program certification and national accreditation (Schalock, 1998). 

The TWS methodology was subsequently taken up by the Renaissance Group, a 
consortium of university-based teacher education programs, and adapted in ways that 
were congruent with differences in state policy contexts in which member institutions 
were situated. In one example of this strategy, faculty and academic leaders at
California State University, Fresno (Fresno State), used the TWS as a foundation for
developing the Fresno Assessment of Student Teachers (FAST) as a strategic response to new 
California state mandates for the use of a standardized performance assessment as 
part of the program certification process (Torgerson et al., 2009). In reports on both the 
Oregon and Fresno State work, authors noted the importance of local ownership of the 
assessment process as integral to faculty support and engagement with the challenging 
work of implementing a TPA. Torgerson et al. (2009) elaborated on the importance of 
this feature of the work at Fresno State: 


In contrast to other university programs that had to select a performance assessment 
and secure faculty support and buy-in, the Fresno State faculty effort, expertise, and 
investment in the creation of FAST made its adoption a natural part of a multi-year 
process to improve programs and assessment. (p. 80) 


The complex interplay between state policy mandates, the affordances and
constraints of standardized TPAs as tools for improving teacher education programs, and 
the dynamics of how these tools are implemented at the program level is the focus of 
the remainder of our review. 


IMPLEMENTING STANDARDIZED TPAs AT SCALE: NEGOTIATING THE TENSIONS BETWEEN ACCOUNTABILITY AND IMPROVEMENT 


In undertaking our review and analysis of contemporary uses of standardized TPAs 
at the state and national level, we are interested specifically in how the implementation 
of the TPA process is mediated not simply by features of the tools themselves, but also 
by contextual variables related to state policy, faculty perceptions and motivation, and 
organizational supports affecting how they are implemented in practice (Cohen et al., 
2020; Moss, 2013; Peck & Davis, 2019). We assume that standardized TPAs, particularly 
when used across programs in the context of state policy mandates, represent a
negotiated response to the tensions between external pressures for accountability and local
priorities related to authenticity and usefulness of the assessment. We do not imagine
these tensions disappearing in the foreseeable future. We also understand these
tensions as a potential impetus for learning, organizational change, and improvement
(Engeström, 1987; Foot & Groleau, 2011). Consequently, our analysis is aimed at
identifying what can be learned from the extant research literature about these tools and
how they might be used in ways that afford equitable evaluation of teacher candidates
while also serving as useful resources for candidate learning, program evaluation, and program
improvement. 


What Are the Features and Affordances of Standardized TPAs? 


Standardized TPAs are designed to gather and evaluate data from actual teaching 
practice, including not only observational records of interactions between teachers and 
students but also artifacts of the kinds of cognitive processes that are involved in
planning, enacting, and improving teaching. Reflecting their alignment with widely accepted
national standards for teaching practice (NBPTS; Interstate Teaching Assessment and
Support Consortium [InTASC]), standardized TPAs generally require descriptions of
classroom context and student demographics, examples of lesson and unit planning,
records of instruction (either through direct observation or via video recording), samples of 
student work with accompanying analysis of student learning outcomes, and follow-up 
instructional plans based on student learning data. Standardized TPAs also differ from 
one another in significant ways, including the length of time during which artifacts of 
the teaching process are collected and analyzed, the specific prompts and evaluation 
rubrics used in the assessment process, and procedures for calibration of scorers. The 
practice-based and integrated view of teaching practice that emerges through this kind
of holistic assessment allows for a richer and more contextualized view of teaching 
competence than either standardized tests of content and pedagogical knowledge, or 
coursework assignments and projects, which focus on the specific aspects of teaching 
addressed in individual courses (Sloan, 2013; Snyder et al., 1998). 

The holistic, integrated, and contextual view of teaching that TPAs can provide offers 
unique opportunities for evaluation and improvement of practice for both candidates 
and faculty. For candidates, some of the most significant challenges of learning to teach 
involve integrating what they have learned from various courses and fieldwork
experiences. For example, the actual practice of teaching requires candidates to integrate and
align what they have learned about culturally responsive teaching (Gay, 2001), equitable
classroom management (Milner & Tenore, 2010), and the subject-specific pedagogies
of academic disciplines such as literacy or mathematics (Shulman, 1987b). TPAs can
provide concrete evidence of how candidates are managing these challenges in their
classroom practice, thus creating opportunities for feedback, evaluation, and learning
that may otherwise be lost in the silos of instruction and evaluation that make it difficult
to build and sustain coherence in most teacher education programs (Grossman
et al., 2008; Richmond et al., 2019). 

Practice-based, integrative descriptions of teaching can be particularly valuable in 
the context of collective processes of program evaluation and improvement undertaken 
by faculty, field supervisors, mentor teachers, and program administrators, each of 
whom may otherwise have only a relatively narrow understanding of the program as a 
whole (Sloan, 2013; Snyder et al., 1998). These kinds of assessment data can provide
concrete information about what candidates are taking up (or not) from their coursework
and fieldwork experiences. This integrated, field-based view of candidate teaching
practice can be particularly useful in identifying possible sources of program outcome
problems (or virtues) that show up in broader, more decontextualized program
evaluations such as graduate or employer satisfaction surveys or value-added assessments 
of P-12 student achievement in program graduates’ classrooms (Davis & Peck, 2020). 

A standardized TPA also affords an opportunity for faculty, field supervisors, and 
mentor teachers to develop a common and concrete language of practice, which in 
turn supports the kinds of communication and collaboration that are critical resources 
for professional learning and programmatic change (Bloxham et al., 2016; Nicolini et al.,
2012; Nolen et al., 2011). The development of a common language of practice takes 
place as faculty negotiate a shared understanding of terms used to evaluate the practical 
activity of teaching as required for consistent scoring of teaching portfolios (Sloan, 2013; 
Whittaker & Nelson, 2013). It is important to note that having a common language does 
not require or imply agreement in decisions about practice, nor does it imply
adherence to static philosophical views about teaching (Buchmann & Floden, 1992). Rather, 
the idea is that a common language of practice allows program members to share a 
relatively clear understanding of the questions and alternatives under consideration 
in decisions about practice (Hall & Horn, 2012). Indeed, without a common language 
as a tool for understanding and negotiating differences in values and practices across 
program participants, it is hard to imagine how programs may achieve or sustain
intellectual or practical coherence, or even recognize instances of incoherence across courses 
and across coursework and fieldwork settings. 
What Questions Are Raised by the Uses of Standardized TPAs? 


Early and enthusiastic appraisals of the potential of performance assessments as a 
resource for improvement of instruction also consistently recognized the inherent
tensions between highly contextualized descriptions of practice such as those offered by 
TPAs and the challenges of making equitable comparisons of practice across individuals 
and settings (Haertel, 1991; Linn et al., 1991; Messick, 1994). As we noted earlier, the 
practical implications of these tensions have become more significant as the uses of 
TPAs have expanded in the context of intensifying accountability policy. In this section 
we identify some of the recurring questions that have been raised about using TPAs for 
the purposes of comparison, particularly in the context of high stakes decisions about 
teacher licensure (Adie & Wyatt-Smith, 2020). In summarizing some of the research 
related to these concerns, we focus particularly on three well known examples of stan- 
dardized TPAs: The TWS (Denner et al., 2004; McConney et al., 1998), the Performance 
Assessment for California Teachers (Pecheone & Chung, 2006), and the edTPA (Stanford 
Center for Assessment, Learning, and Equity, 2015). Our interests are not in critiquing 
these specific measures, but in clarifying some of the strengths and limitations they 
share. We organize our discussion in terms of some of the practical questions that 
policymakers and practitioners must consider in using these tools, and in interpreting 
the data they produce with respect to both accountability and improvement purposes. 


Are TPAs Valid Measures of Teaching Practice? 


TPAs are typically aligned with state and national standards for teaching practice,
including those developed by InTASC and NBPTS. These standards represent a
decades-long effort by teachers and teacher educators to establish national consensus
on expectations for practice that can guide the preparation and evaluation of teachers
(Shulman & Sykes, 1986). A considerable body of evidence suggests that TPAs based on
these standards do measure important aspects of teaching practice (Bastian et al., 2016;
Campbell et al., 2016; Cooner et al., 2011; Denner et al., 2004; Pecheone & Chung, 2006;
Sato, 2014; Stanford Center for Assessment, Learning, and Equity, 2015; Stewart et al.,
2015). In addition, several studies have found that scores on these kinds of measures
may predict the effectiveness of novice teachers after licensure, although these findings
are not consistent across studies (Bastian et al., 2016; Chen et al., 2021; Darling-Hammond
et al., 2013; Goldhaber et al., 2017; Stanford Center for Assessment, Learning, and Equity,
2015).

Some standardized TPAs have been criticized for not attending more substantively
to issues of teaching practice related to social justice and equity (National Association
for Multicultural Education, 2014; Stillman et al., 2013; Tuck & Gorlewski, 2016).
Particularly at issue are relational dimensions of teaching that are difficult to measure
(Behizadeh & Neely, 2019; Choppin & Meuwissen, 2017). For instance, Choppin and
Meuwissen (2017) reported that candidates felt the videos produced for the edTPA
did not capture the relationship building that happened between teachers and students
in informal interactions outside of class time. Others have argued that performance on
TPAs is conflated with skills that are not essential for teaching, such as candidates'
writing skills (Behizadeh & Neely, 2019), technology skills (Choppin & Meuwissen, 2017),
and “test-wiseness” (Clark-Gareca, 2015). These are important concerns that merit more
focused and rigorous research than has been reported to date. One example of related
work under way is the recent revision of the CalTPA
(https://www.csusm.edu/soe/currentstudents/caltpa_overview.pdf), accompanied by
a direct investigation and evaluation of its features using criteria drawn from the
literature on equity and racial justice in teacher education (Escalante et al., 2021).

Some recent research suggests that candidate performance on a TPA may also
be affected by the quality of program preparation and support candidates receive in
preparing for the assessment (Adie & Wyatt-Smith, 2020; Kim & Sato, 2019), including
the characteristics of mentor teachers and student teaching placement schools (Bastian
et al., 2020). Of course, programs should affect teaching practice; that is their mission,
and the rationale for investing in program improvement. However, when achieving a
certain TPA score is used as a requirement for licensure, there is a concern about the
validity of comparative judgments about teacher quality in the context of differences in
opportunities to learn and perform. From one perspective, the primary concern should
be to protect the educational welfare of P-12 students by ensuring that prospective
teachers are competent (i.e., that they can pass rigorous licensure requirements such
as a TPA). From a teacher candidate's perspective, however, an additional concern is
that decisions about licensure be made equitably, with appropriate consideration of the
ways context (including both coursework and practicum work) may affect teaching
performance. For example, Bastian et al. (2020) found that student teaching placement
characteristics had a statistically significant impact on edTPA scores, a finding congruent
with other research on the impact of placement characteristics on the learning and
effectiveness of novice teachers (Goldhaber et al., 2017; Ronfeldt, 2015). These
studies raise questions about how TPA scores (and indeed other measures of
candidate learning and performance) should be interpreted in the context of significant
differences in opportunities to learn in both the university and field-based components
of teacher preparation programs. We note, however, that these tensions are by no means
restricted to evaluations of teaching. They also exist in other fields of professional
education, such as law, medicine, and architecture, in which they are consistently resolved
in favor of protecting the standards of the profession and the interests of the populations
to be served.

Practical strategies for improving the validity of judgments about individual teaching
in ways that protect the interests of both candidates and P-12 students include
collecting broader samples of teaching practice, across both time and contexts, as these
are likely to yield more robust findings about the quality of preservice teacher practice.
For example, sampling preservice teaching performance in multiple field settings
representing variation in student populations, mentor teacher characteristics, and school
curriculum policy allows for some direct evaluation of the impacts of these contextual
factors on the quality of candidate teaching practice. The costs of collecting and evaluating
broader samples of teaching performance may be reduced by embedding these
as formative assessments at multiple points within a program of teacher preparation
(e.g., Alloway & Lesh, 2019).

Other concerns about the validity of TPAs may be grounded in differences in values 
and beliefs about teaching, including those that may exist between university-based 
teacher educators and local practitioners and community members (Zeichner et al., 
2015). The recent and dramatic increase in public attention and engagement with issues 
of racial equity and social justice also reminds us that educational values and priorities
change over time. This suggests that questions about the validity of TPAs are likely, and
appropriately, to be a focus of ongoing dialogue and deliberation among researchers,
practitioners, teacher educators, and the communities they serve.


Do Different Evaluators Rate TPAs Consistently? 


The challenges of achieving consistent inter-rater agreement in scoring and evaluation
of TPAs have been noted since the introduction of these assessment methodologies
(Haertel, 1991; Messick, 1994). Psychometric studies carried out with several TPA
instruments suggest that these challenges persist across differences in specific work
sample content and in protocols for rater training and scoring. For example, while
satisfactory inter-rater agreement was achieved in scoring Renaissance TWSs when the
assessments were evaluated by panels of three raters (Denner et al., 2009), a subsequent
study of rater discourse during these evaluation processes showed that agreement, though
often achieved, was not necessarily based on shared understanding and evaluation
of the candidate's work, but rather on locally negotiated conventions for reaching agreement
(e.g., “splitting the difference”; Bullough, 2010).

Reliability studies for both the Performance Assessment for California Teachers (PACT)
and edTPA have similarly suggested that while substantial inter-rater agreement can
be achieved for these measures with strong rater training and audits that conduct
back-reads (Pecheone & Chung, 2006), the procedures used to calculate agreement for these
measures may mask meaningful differences in rater evaluations (Gitomer et al., 2019;
Peterson & Lyness, 2015; Porter & Jelinek, 2011). For these reasons, it is important to
plan for multiple scorers for portfolios that score near the cut point that determines passage
of an assessment (e.g., Whittaker et al., 2018). Concerns about inter-rater agreement in the
scoring of TPAs are not restricted to these three measures. Detailed psychometric studies of
other TPAs (e.g., Riggs et al., 2009) reflect similar challenges in achieving consistency in both
calibration and scoring procedures. In other cases, relevant psychometric data are not
included in published descriptions of the instrument (e.g., Meyer et al., 2018).
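The point that agreement statistics can mask meaningful differences between raters can be made concrete with a small worked example. The sketch below is illustrative only: the ten pairs of rubric scores are hypothetical (invented for this example, not drawn from any study cited here), and it assumes a rubric scale of 1 to 5. It compares three commonly used indices: exact agreement, exact-plus-adjacent agreement (scores within one point of each other), and Cohen's kappa, which adjusts exact agreement for the agreement expected by chance given each rater's distribution of scores. We do not claim that any particular TPA reports its reliability in exactly this way.

```python
from collections import Counter

# Hypothetical rubric scores (1 to 5 scale) from two raters for the same ten
# portfolios. These values are invented for illustration only.
rater_a = [3, 3, 2, 3, 4, 3, 2, 3, 3, 4]
rater_b = [3, 3, 3, 3, 3, 2, 2, 3, 4, 3]


def exact_agreement(a, b):
    """Proportion of portfolios on which both raters gave the identical score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def adjacent_agreement(a, b, tolerance=1):
    """Proportion of portfolios on which the two scores differ by at most `tolerance`."""
    return sum(abs(x - y) <= tolerance for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa: exact agreement corrected for chance agreement."""
    n = len(a)
    observed = exact_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: the probability that both raters independently assign the
    # same score, estimated from each rater's marginal distribution of scores.
    expected = sum(counts_a[s] * counts_b[s] for s in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print(f"exact agreement:    {exact_agreement(rater_a, rater_b):.2f}")
    print(f"adjacent agreement: {adjacent_agreement(rater_a, rater_b):.2f}")
    print(f"Cohen's kappa:      {cohens_kappa(rater_a, rater_b):.2f}")
```

For these hypothetical scores the script reports exact agreement of 0.50, adjacent agreement of 1.00, and a kappa of about 0.04. By the adjacent criterion the two raters agree perfectly, yet their exact scores agree little better than chance would predict. Because most of these invented scores cluster around a single rubric level, disagreements of one point could matter a great deal if a cut score happened to fall at that level, which is one reason for additional reads of portfolios scoring near the cut point.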

The consistency with which difficulties in achieving agreement across raters have been
reported suggests the need for careful consideration of the kinds of decisions for which
these tools are used, as well as additional care in rescoring assessments near the cut score
for passage. These findings also underscore the importance of rigorous scorer training and
of ongoing recalibration and assessment of inter-rater agreement. Finally, it is important
that inter-rater agreement data be used to continuously evaluate the scoring process and to
identify both the need for general changes in scorer training and interventions to remediate
the performance of individual scorers (e.g., Meyer et al., 2018; Stanford Center for
Assessment, Learning, and Equity, 2015).


Equity and TPAs 


Concerns about racial equity related to educational assessment in higher education 
are pervasive and compelling (Stewart & Haynes, 2015). Petchauer et al. (2018) trace 


Footnote. The CalTPA work cited above is an example of what this might look like.
the history of standardized assessment in teacher education, noting how racial group
differences in performance on teacher tests have been an ongoing focus of concern,
particularly with respect to the gatekeeping functions of these tests (Gitomer et al., 2011;
Goldhaber & Hansen, 2010). Although these concerns historically focused on
multiple-choice tests of academic skills and subject-matter knowledge, where racial
disparities in performance have been more pronounced, research on standardized TPAs
has shown that some historically marginalized cultural and/or racial groups may not
perform as well as their White counterparts on these assessments. For example,
Goldhaber et al. (2017) reported that, although failure rates were relatively low for both
groups, Hispanic teacher candidates in Washington State were three times more likely to
fail the edTPA than White candidates. Petchauer et al. (2018) note that
studies comparing the performance of candidates from different racial groups on PACT
found lower test scores for Black candidates (Stanford Center for Assessment, Learning,
and Equity, 2015) but did not replicate the findings of Goldhaber et al. (2017) regarding
Hispanic candidates. Taken together, these results suggest that equity concerns related
to standardized TPAs warrant careful and conservative approaches to interpretation
of test score data, particularly in the context of high stakes decisions about licensure.
However, we note Haertel’s (1991) early questions about the sources of these risks: 


The fact that, on average, some racial/ethnic, gender, or other identifiable groups 
outperform others on an examination does not in itself imply that the examination is 
biased, but should nonetheless trigger a careful scrutiny of the test itself, the conditions 
of its use, and the prior preparation of examinees to ensure, to the extent humanly
possible, that test bias is not present. (p. 24)


While a facile response to equity concerns about TPAs might be to eliminate the
tests, it is entirely possible that racial and/or cultural differences in TPA pass rates
signal problems with the quality and effectiveness of program support and preparation
of candidates to succeed on these assessments. For example, in one survey study of the
experiences of candidates of color with the edTPA (Williams et al., 2019), candidates of
color perceived themselves to be more ready for the assessment than White candidates
perceived themselves to be, and yet they failed more often than White candidates.
While the study does not provide enough detail to allow for unambiguous interpretation
of the interactions that took place between these candidates and their instructors,
the authors suggest that the quality of faculty feedback to candidates may have
contributed to the poor alignment between candidates of color's self-assessments of
their readiness and their subsequent performance on the TPA. Eliminating the TPA as
a measure of candidate readiness to teach may erase a signal of inequitable practices
in teacher preparation without addressing their sources.

Several recent studies have documented significant differences in both the quality 
and quantity of faculty and program-level supports for candidates as they prepare for 
high stakes TPAs (Cohen et al., 2020; De Voto et al., 2020; Ratner & Kolman, 2016). The 
extent to which these differences affect TPA performance is still unclear. In the Williams 
et al. (2019) study, candidates’ edTPA scores were positively related to the number of 
support activities that they attended. The Cohen et al. (2020), Denner et al. (2009), and 
Ledwell and Oyler (2016) reports all show that significant variation in these kinds of 
supports may exist even across programs within a single institution. As mentioned
above, findings from Bastian et al. (2020) suggest that at least one program variable,
the quality of mentor teachers selected for supervision of practicum experiences, had
a substantial impact on TPA performance. Taken collectively, these studies suggest that
the quality of program preparation and support, including the selection and support of
mentor teachers, is indeed likely to affect TPA scores. The findings amplify concerns
about equity related to the uses of standardized TPAs as high stakes measures of teaching
quality, and they also suggest that the locus and burden of accountability policies may
be disproportionately placed on candidates rather than programs.


TPAs and Decisions About Licensure 


The considerations involved in using TPAs as high stakes measures of readiness to 
teach are particularly acute in cases where candidate scores are close to the established 
“cut scores” for licensure (Goldhaber et al., 2017). Following advice from assessment 
researchers and test developers, augmentative procedures such as double and triple 
reading of portfolios scoring at or near cut scores for licensure have been developed and 
implemented for several TPAs, such as those used with National Board Certification, the
edTPA, and the TWS. We might reasonably anticipate that further research and
development work could improve both the instruments and the procedures for scoring them.
However, Davey et al. (2015) also argue that “lower levels of score comparability may 
simply need to be accepted as the price for measuring otherwise inaccessible constructs 
or for measuring in more direct ways,” further noting that “less than fully comparable 
scores may also be perfectly acceptable for use in lower stakes circumstances” (p. 52). 

This conclusion is consistent with commentaries on performance assessment from
other fields of human service that underscore both the importance and the psychometric
challenges of considering social context in evaluating professional
practice (Govaerts et al., 2007; Truijens et al., 2019). Contemporary views of learning
go further, holding that professional practice is not simply embedded in social context
but constructed in large measure from the social and material resources those contexts
provide (Lave & Wenger, 1991). These considerations suggest that the process of evaluating
preservice teacher practice, particularly for the purpose of making equitable and
appropriate decisions about individual teacher licensure, is inescapably interpretive in
nature. These decisions are strengthened not only by reliance on multiple sources of
relevant evidence (including TPA data) but also by careful and deliberative dialogue
among local teacher educators, including supervisors and mentor teachers whose work
is situated in the P-12 classroom (Moss, 1992; Moss et al., 1998).


HOW ARE THE AFFORDANCES OF TPAs AS RESOURCES 
FOR PROGRAM IMPROVEMENT AFFECTED BY THE 
WAYS IN WHICH THEY ARE IMPLEMENTED? 


More than two decades of research on TPAs, including both informal records of
teaching collected in portfolios and more standardized samples of teaching practice
used in large-scale assessments of preservice teachers, demonstrates that TPAs can
provide meaningful opportunities for candidate and faculty learning and improvement of
practice (Borko et al., 1997; Bunch et al., 2009; Chung, 2008; Kohler et al., 2008; Lin, 2015).
It is also clear that TPA data can be used effectively as a resource for program
improvement (Cuthrell et al., 2019; Peck et al., 2010; Pointer Mace & Luebke, 2021; Reusser et
al., 2007). However, it is equally clear that these outcomes are not always achieved and
that the extent to which TPAs are actually used as resources for program evaluation and
improvement depends very much on how they are implemented (De Voto et al., 2020).
For instance, an implementation process that is rushed or inadequately resourced can
lead to increased faculty alienation and resistance to the TPA. In the following sections,
we describe some of the policy conditions and implementation processes that may affect
whether and how TPAs are used as resources for program improvement.


State Policy Context 


A number of studies have investigated state policy conditions that may affect TPA
implementation. These include the “stakes” of the assessment for candidates and
programs, the speed of implementation, and the opportunities for inter-organizational
collaboration in the implementation process.


High Stakes/Low Stakes 


One of the most common concerns about the use of standardized TPAs is that the 
use of these assessments as high stakes requirements for teacher licensure induces 
candidates to engage them as a compliance task, overlooking their affordances as an 
opportunity for learning and improvement of practice (Behizadeh & Neely, 2019; Dover, 
2018; Rennert-Ariev, 2008). This clearly happens in many cases. However, there is also 
evidence that some candidates may adopt a “compliance” stance even when completing 
low stakes TPAs and non-standardized teaching portfolios (Borko et al., 1997; Chye
et al., 2019; Cronenberg et al., 2016; Darling-Hammond, 2010). There is also evidence 
that some preservice teachers engage TPAs with an inquiry and learning orientation, 
even when the stakes for licensure are high (Bacon & Blachman, 2017; Bunch et al., 
2009; Lin, 2015; Okhremtchouk et al., 2013). 

Programmatic responses to high stakes state policies requiring passing TPA scores 
for state licensure also vary dramatically (De Voto et al., 2020; Ledwell & Oyler, 2016). 
Some teacher educators interpret TPAs as a valuable opportunity for learning and 
improvement of both individual and collective practice and describe the value of these 
classroom-based data as a new and compelling source of insight and motivation for 
change. For example, one program director described this kind of proactive,
inquiry- and improvement-oriented faculty response to PACT data in this way:


The persuasive piece was once they saw the student work. I mean, where a few people 
kind of went, “Whoa.” I teach this in my class and I’m not seeing it ... looking at the 
student work from the mock scoring there was that “ah hah” moment where [it was 
clear that] our candidates didn’t know much about (academic language).... [One
professor] changed her entire series of assignments ... to better reflect what the holes in
the data [showed]—also to incorporate more clearly the notion of academic language 
and mathematics. She literally rewrote everything related to that assignment because 
it was so compelling to her, the data ...and seeing the student work. (cited in
Darling-Hammond, 2006, p. 19)


In other cases, faculty may engage state mandates related to accountability with the
clear intention of deflecting what they interpret as an unwarranted intrusion of state
control into matters that have historically been the domain of local faculty discretion and
decision-making (Kornfeld et al., 2007). However, while faculty and academic leaders
may share concerns about loss of local control over licensure decisions, some
nevertheless elect to organize local implementation of mandated TPAs with the goal of using
the data as an opportunity for program-level learning and improvement (De Voto et
al., 2020; Lys et al., 2014; Peck et al., 2010).

These studies suggest that, while high stakes uses of standardized TPAs may be 
problematic for many reasons, their effects on both candidate and faculty learning and 
improvement of practice are inconsistent and are strongly mediated by factors such as 
local leadership action (we discuss this in more detail later in the paper). In view of 
this finding, we consider additional contextual factors related to state policy that may 
affect how TPAs are perceived and used. 


Pace/Speed of Implementation 


There have been notable differences in the pace of mandated TPA implementation
by policymakers in different states. In two of the earliest statewide TPA
implementation efforts (in California and Washington State), state policymakers took a
relatively gradual approach. These states required a passing score on the
standardized TPA as a high stakes requirement for teacher licensure only after a substantial
period of planning and preparation during which the TPA was a program requirement
but was not used directly for licensure decisions (Chung, 2008; Peck
et al., 2012). It is worth noting, however, that even with what in retrospect might be
considered a relatively gradual schedule for full implementation of the new assessment
policies, teacher educators in both California and Washington reported considerable
stress and confusion as the implementation process unfolded at the state level (Lit &
Lotan, 2013; Peck et al., 2012). We note that while stress may be an inevitable side effect
of TPA implementation, the meanings of the process appear to be strongly mediated by
how its purposes are understood by program members. Change efforts oriented around
goals for inquiry and program improvement do not avoid stress but often do yield
programmatic changes that are highly valued by teacher educators (Darling-Hammond,
2006; Peck et al., 2010; Whittaker & Nelson, 2013). In contrast, the work (and stress)
required for TPA implementation when it is oriented toward compliance goals appears
more likely to be experienced as alienating and “subtractive” in its effects on program
integrity and faculty autonomy (De Voto et al., 2020).

In some states, most notably New York, policymakers have elected to require 
implementation and high stakes use of a standardized TPA quite rapidly (Clayton, 
2018a; Ledwell & Oyler, 2016; Reagan et al., 2016). In a comparative review of edTPA 
implementation reports from six early adopter states, Reagan et al. (2016) concluded: 
rapid implementation of the edTPA in New York may have resulted in pushback and
adjustments to the full-scale consequential implementation of the edTPA... 


and that 


...the timeline for the edTPA in New York may have limited meaningful dialogue 
among the multiple actors involved in the implementation of the assessment. (Reagan 
et al., 2016, p. 16) 


We conclude that while a slower pace of implementation does not necessarily avoid 
stress and confusion, it is likely to allow better communication and more effective plan- 
ning and problem solving between state policymakers, teacher educators, and P-12 
partners. This, in turn, can lead to more meaningful use of TPAs as tools for achieving 
program improvement goals that matter to program faculty, staff, and candidates. 


Inter-Organizational Collaboration and Support 


Several studies have highlighted the value of inter-organizational collaboration as 
a support for TPA implementation (Olson & Rao, 2017; Peck et al., 2012; Warner et al., 
2020). In some cases, collaborative relationships between teacher education programs
and state policy agencies have been reported to facilitate program-to-program communication
and problem solving as implementation challenges have arisen. One example of this
kind of collaboration between state policymakers and teacher educators was reported
by Meyer et al. (2018), who described the development and implementation of a
standardized, high stakes TPA in the State of Kansas. The authors noted that strong
state administrative support for the assessment, coupled with a local program policy of
incorporating scoring costs into student tuition, allowed the TPA to be implemented
at a relatively low direct cost to students.

In another state-level study of TPA implementation, Peck et al. (2012) described 
the collaborative effort between the American Association of Colleges for Teacher 
Education (AACTE)-affiliated network of teacher education programs and state policy 
administrators in Washington State during the pilot phases of edTPA implementation. 
Based on follow-up interviews with 26 program administrators, faculty, and field
supervisors, as well as documents collected from 8 programs participating in a field test of
the edTPA, Peck et al. concluded that 


it appears to us that the most potent resource available for supporting positive
implementation outcomes may be the collaborative relationship that has been forged between
the PESB (the state Professional Education Standards Board), edTPA (SCALE) and 
WAACTE programs. (p. 22) 


The value of building state-level supports for inter-organizational collaboration
related to TPA implementation was also underscored in an account of edTPA
implementation in the State of Illinois (Olson & Rao, 2017). Using an analysis of state
and program-level documents describing edTPA implementation efforts in the state
between 2010 and 2017, Olson and Rao describe a formal collaboration among state
policymakers, SCALE, and the state network of teacher education programs affiliated
with AACTE: the Illinois Teaching Performance Assessment Consortium. Olson and
Rao found that this inter-organizational structure functioned as an important
support for both programs and policymakers as they navigated the practical challenges
of edTPA implementation, including development of a shared understanding of the
TPA and cross-program sharing of specific strategies and resources for faculty and
candidate support.


Program-Level Factors Affecting TPA Implementation 


Local program characteristics, policies, and practices clearly affect TPA
implementation (De Voto et al., 2020; Ledwell & Oyler, 2016). These include the values, beliefs, and
motivations of the people undertaking the work of implementation, the ways in which
the TPA itself is scored and used, and the organizational supports available to
candidates, faculty, and program staff related to the work of implementation. We consider
each of these program characteristics separately and then discuss the critical function
of program leadership in shaping how they interact with one another in making TPAs
function as useful resources for program improvement.


People: Local Values, Perceptions, and Interpretations of the TPA 


Research from the P-12 sector has shown that teachers’ responses to reform policy 
initiatives are deeply rooted in interactions between teachers’ prior values, beliefs, and 
perceptions about the purposes of their work and their perceptions of the purposes 
underlying the proposed reform (Coburn et al., 2009; McLaughlin, 1987; Spillane, 
2000). Similarly, several studies of TPA implementation make it clear that what has
been referred to as the “stance” of program faculty and staff can affect TPA
implementation, including the quality and amount of support provided to teacher candidates in
preparing for the assessment (Cohen et al., 2020; De Voto et al., 2020; Ledwell & Oyler, 
2016; Ratner & Kolman, 2016). For example, in a study of faculty working in teacher 
education programs situated in public colleges of education in New York City, Ratner 
and Kolman (2016) collected open-ended survey responses from nine faculty members 
and conducted follow-up interviews with three of these faculty. As the authors note, 
this study was conducted in the context of the unusually rapid implementation of a 
high stakes TPA mandate in the state of New York. They found that faculty responses 
to the TPA initiative were quite varied, ranging from wholesale rejection of the purpose 
and legitimacy of the TPA to cautious support. An important finding, however, was 
that while all faculty participants in this study expressed commitment to their students’ 
learning, differences in faculty stance toward the TPA were associated with substantive 
differences in both the extent and types of practices with which they engaged the work 
of preparing candidates to pass the TPA. 

The Ratner and Kolman (2016) study is quite limited in terms of both its participant
sample and methodology (as the authors acknowledge). However, their findings are
consistent with those from additional studies of the relationship between faculty stance
and TPA implementation (Cohen et al., 2020; De Voto et al., 2020; Ledwell & Oyler,
2016). De Voto et al. (2020) used data from interviews, focus groups, and documents
to investigate the ways that 69 faculty, staff, and teacher candidates from 8 teacher
education programs in 2 states interpreted and implemented the edTPA. They found
that when faculty, staff, and teacher candidates interpreted the TPA to be in
alignment with local program values and practices, their commitment to implementing the
assessment and using it for program improvement was relatively high. Conversely, in
programs where the TPA was viewed as conflicting with local values, both the will and
capacity of programs to implement the assessment were reduced. De Voto et al. (2020)
noted that the relationships between faculty stance and TPA implementation were not
uniform and appeared in some cases to be mediated by additional variables, including
the quality of local program leadership and the availability and allocation of resources to
support implementation.

Cohen et al. (2020) report additional evidence about the relationship between the 
interpretive stance of program members and the supports provided to candidates in 
preparing for the edTPA. Using survey and interview data collected from faculty, field 
supervisors, and candidates across multiple programs situated in a research-intensive 
university, Cohen et al. (2020) found that candidates were quite sensitive to differences 
in faculty stance toward the edTPA. Candidates viewed these differences in faculty 
stance to be related to differences in the supports they received in preparing for the 
assessment. Program members’ stances toward the edTPA also varied by
role: tenure-line faculty reported more negative views about the assessment than
field supervisors did, were less likely to participate in professional development activities
aimed at developing a shared understanding of and commitment to implementing the
assessment, and were less likely to recognize the affordances of the edTPA as a resource
for improvement of practice.


Tools: How TPA Data Are Scored, Disaggregated, and Organized 


Several recent studies suggest that how TPA data are analyzed and presented can 
affect the value of the data as a resource for learning and improvement. 


Local scoring. One issue that has received considerable attention concerns local
versus external scoring of TPA portfolios. Accounts of TPA implementation consistently
describe tensions between the value of local scoring of TPAs by program faculty and 
staff, and the value of having portfolios scored by individuals with no direct ties to 
local programs (Sloan, 2013; Warner et al., 2020). For example, Sloan (2013) described 
the important opportunities for learning and collaboration that attended local scoring, 
noting that faculty and staff, in learning to score the TPAs consistently, were forced to 
recognize and engage differences in their understanding and evaluation of teaching 
that had been previously obscured by “silos” of practice within the program. Sloan also 
reported that joint examination of candidate portfolios allowed all program members to 
see what candidates had (and had not) taken up from their coursework and integrated 
into their classroom practice. In Sloan’s account, the shared understanding of program
outcomes afforded through the local scoring process became a resource for
negotiating a joint focus and commitment to program improvement actions.
Cuthrell et al. (2019) reported a similar process of faculty learning and engagement
with change related to local scoring of TPAs. In the program they studied, local scoring
was gradually displaced by external scoring (via Pearson); however, local scoring was
considered valuable enough that 20 percent of TPAs were retained for local co-scoring.

An important question about local scoring is whether results are affected by the 
positionality of local scorers as faculty and staff of the program in which candidates are 
enrolled. Pursuing this question, Bastian et al. (2016) compared edTPAs that were scored 
both locally and externally (via Pearson) in one large university program. Results of 
these comparisons for 64 teacher candidates showed that although edTPA scores were 
consistently higher when scored locally, local scores were nevertheless predictive of 
teacher evaluation ratings collected during candidates’ first year of teaching. Bastian
et al. (2016) concluded that while the inconsistency of local and external scoring results
raises questions about using local scores for high-stakes licensure decisions, local
TPA scoring may be valuable for improving faculty and staff understanding of program
outcomes and for engaging them in collaborative efforts toward program improvement.


Disaggregation of TPA data. Several reports of TPA implementation suggest that
faculty learning and program improvement are affected by the ways in which TPA 
data are disaggregated for analysis and interpretation. Sloan (2013) and Whittaker and 
Nelson (2013) provide similar accounts of how faculty in two programs in California 
used specific examples of work from PACT portfolios as a context for collaborative, evi- 
dence-based conversation about what candidates had and had not taken up from their 
coursework and fieldwork experiences. Sloan (2013) commented that program mem- 
bers were particularly engaged in this collaborative work when “candidate work is on 
the table.” Whittaker and Nelson (2013) describe how PACT data evaluating candidates’ 
instruction of English learners were used to design a series of professional development 
activities for course instructors and field supervisors. These activities were followed by 
curriculum alignment and integration workshops in which program faculty collabo- 
rated to make major program changes aimed at improving candidates’ practice related 
to supporting the academic language development of their P-12 students. 

Bunch et al. (2009) also described an analysis of PACT portfolios focused on evalu- 
ation and improvement of candidate instructional practices with English learners. 
Using qualitative data analysis techniques, the authors found evidence that candidates 
used a variety of research-based curricula and instructional strategies in their teaching, 
including the use of multiple modes of representation in instruction, connecting to stu- 
dents’ experiences in their cultural communities, and supporting students’ use of their 
native language as a resource for learning. However, careful examination of candidate 
portfolio commentaries also showed the persistence of deficit orientations underlying 
some candidates’ interpretation of student learning, particularly when their students 
were struggling. Bunch et al. (2009) commented on the value of “going beyond the 
scores” in using PACT portfolios as formative assessments of candidate learning, and 
as resources for program self-assessment and improvement. 

A very different approach to disaggregation of TPA data was reported by Bastian 
et al. (2018). Using Latent Class Analysis (LCA), Bastian et al. identified four patterns 
of candidate instructional practice based on candidate scores on the individual rubrics 
of the edTPA. They comment on the potential value of LCA as a tool for identifying 
groups of teacher candidates that may need specific kinds of preservice supports, and 
as a source of data for identifying priorities for induction year supports following
licensure. The LCA methods described by Bastian et al. may be particularly valuable 
as a way of empirically identifying subsets of TPA portfolios to analyze in more depth 
as recommended by Bunch et al. (2009), Sloan (2013), and Whittaker and Nelson (2013). 


Organizational Policies and Practices 


The organizational conditions in which TPAs are implemented have significant 
effects on the extent to which opportunities for learning and improvement of practices 
are taken up by both candidates and program faculty. Several specific issues related to 
the support of TPA implementation have been identified in the literature. 


Time and space for collaboration. TPA implementation research strongly suggests that 
actualizing the affordances of TPAs as resources for learning and program improvement 
requires strategic allocation of time and space for faculty collaboration (Davis & Peck, 
2020; De Voto et al., 2020; Lys et al., 2014; Peck et al., 2010; Sloan et al., 2021). In some 
institutions, practical routines for supporting faculty collaboration are deeply institu- 
tionalized, and new kinds of TPA data are largely integrated into existing policies and 
practices (Pointer Mace & Luebke, 2021). More often, however, implementation stud- 
ies suggest that new routines and practices must be developed to support the kinds of 
collaboration that are required to make TPA data useful for program improvement. For 
example, Lys et al. (2014) describe regular “data summits” designed to create a time and place
for faculty analysis and interpretation of TPA scores in the context of decisions about 
curriculum changes. Similarly, Sloan (2013) described week-long program retreats in 
which regular coursework and fieldwork activities were suspended to allow program 
faculty and staff to jointly score TPA portfolios and discuss their implications for pro- 
gram improvement actions. The importance of making time and space for faculty and 
staff collaboration is underscored by the De Voto et al. (2020) cross-program research, 
in which these practices were associated with the adoption of an “inquiry-oriented”
approach to TPA implementation as a resource for program improvement. 


Professional development. One of the challenges of TPA implementation is what Lys 
et al. (2014) refer to as faculty “readiness.” It is important to note that by “faculty” we 
refer here to university-based course instructors, including those in colleges of educa- 
tion and in arts and science departments, as well as field-based supervisors and mentor 
teachers—all of whom play essential roles in teacher preparation (Goodlad, 1990). 
Depth of teacher educators’ knowledge about a TPA is clearly critical to productive 
implementation, and opportunities to develop a shared and concrete understanding of 
the required artifacts, evaluation rubrics and procedures for scoring are among the fac- 
tors contributing to the uses of TPA data for program improvement (Sloan, 2013; Whit- 
taker & Nelson, 2013). Some implementation studies also suggest that opportunities to 
openly discuss concerns about both the tools and the policy contexts in which they are 
implemented are related to faculty stance toward the TPA, particularly in the context 
of external policy mandates (De Voto et al., 2020; Lys et al., 2016; Peck et al., 2010). 

Suleiman and Byrd (2016) describe and evaluate a model of edTPA-related professional
development specifically related to the role of university field supervisors.
The model focused not only on each supervisor’s understanding of the edTPA in
terms of purpose, design, and data analysis but also included a feedback process that 
field supervisors used to support teacher candidates throughout the edTPA process. 
Follow-up data showed that teacher candidates had a more positive attitude toward 
their supervisors and felt more prepared to complete the edTPA following this profes- 
sional development work. In a related study, Steadman and Dobson (2018) reported 
that a series of professional development meetings focused on implementation of the 
edTPA provided a new context for collaboration and joint planning for field supervi- 
sors that led to more consistency in their support for candidates and a stronger sense 
of shared identity as a “community of practice.” This finding is congruent with reports 
from faculty in the Peck et al. (2010) study, including the finding that for some faculty 
and supervisors who had previously felt marginalized in program discussions and 
decision-making, participating in shared professional development activities related 
to PACT implementation increased both their sense of belonging and their investment 
in the program. 


Leadership: Orchestrating TPA Implementation 


Evidence from both detailed case studies and larger-scale cross-program investigations
of TPA implementation reflects the crucial role that leadership plays in managing
the challenges and accessing the opportunities for learning and program improvement
that attend TPA implementation (De Voto et al., 2020; Lys et al., 2014; Peck et al., 2010;
Sloan, 2013). For example, De Voto and colleagues (De Voto & Thomas, 2020; De Voto et
al., 2020) found that where academic leaders “set the tone” and “urged faculty and staff
to look beyond the TPA mandate and focus on its framework as a tool for inquiry, program
improvement, and redesign” (De Voto et al., 2020, p. 8), faculty and staff were more likely
to actively engage the edTPA as an opportunity for learning and program improvement.
Conversely, where leaders were less actively engaged in supporting sense-making around
the edTPA, there was evidence of cosmetic compliance and active resistance (De Voto
& Thomas, 2020; De Voto et al., 2020; Ledwell & Oyler, 2016).

The case study reported by Peck et al. (2010) provides a more concrete example 
of how program leaders facilitate the kind of sense-making that can support faculty 
engagement with inquiry-oriented TPA implementation. In this program, academic 
leaders facilitated a series of discussions in which program faculty reflected on their own
values and goals for the program. Faculty then used their list of “valued outcomes” as 
an analytic framework for decomposing the requirements of the PACT process. These 
deliberations allowed program members to identify specific areas where the tool was 
aligned (and misaligned) with local values and identify opportunities to use the tool 
in ways that supported their own program goals. This general leadership strategy is 
also evident in other case study reports of TPA implementation (Lachuk & Koellner, 
2015; Torgerson et al., 2009). 

Sloan (2013) observed that leadership responsibilities in the TPA implementation 
process were not restricted to program members in formal administrative positions but 
were distributed across virtually all program members in ways that were related to their 
specific roles. For example, Sloan describes how field supervisors took on important 
leadership functions in creating new tools that assisted both candidates and mentor 
teachers in developing a clearer understanding of the goals and requirements of PACT
and ensuring that planning and communication were well supported as candidates 
completed the assessment process. The importance of distributing leadership roles and 
functions widely across program members was also evident in a retrospective analysis 
of 6 years of edTPA implementation work completed at a large teacher preparation
institution (Cuthrell et al., 2019). These authors emphasized the importance of carefully 
preparing and supporting program members in assuming leadership roles in the TPA 
implementation process. 


SUMMARY REFLECTIONS AND RECOMMENDATIONS 


Almost two decades ago, Shulman (2005) commented on the status of teacher edu- 
cation as a field in which “a thousand flowers” were allowed to bloom, a field perhaps 
charming in its diversity but virtually impossible to cultivate. The enormous body of 
work that has ensued since that time has moved the field a considerable distance toward 
establishing a professional consensus on standards articulating what we mean by 
“teaching quality” (NBPTS, InTASC), along with a variety of performance assessment 
tools aimed at evaluating and improving teaching practice based on those standards. 
However, the progress that has been made co-exists with ongoing philosophical and 
political disputation about the purposes of education, accompanied by related disputes 
regarding choice of theoretical tools and relevant empirical evidence (Cochran-Smith 
& Fries, 2001; Wilson & Youngs, 2005). 

To these ongoing challenges we add the observation of many scholars, and perhaps 
almost all practitioners, that teaching is at its core inescapably contextual in nature 
(Grossman et al., 2009). Good teaching is, perhaps to an uncomfortable extent for some
policymakers, contingent on a teacher’s appraisal of and response to the demands and
affordances of specific situations, including the complex interplay of individual student 
characteristics and needs, the classroom as a social community with its own history and 
culture, state and district policy, and (to be sure) the repertoire of knowledge and skill 
of the teacher. Our review suggests that TPAs can be extremely useful resources for 
evaluating and improving teacher education programs. However, like other aspects of 
teaching, they are situated in complex systems of professional activity in which skilled 
and knowledgeable judgment is fundamental to making good decisions about how they 
are used. Moss (2013) comments on this in a particularly informative way: 


If the goal is to make decisions about how to improve teaching and learning or to make 
choices among alternative courses of action or policies, evidence of student outcomes 
alone is insufficient; one must consider information about the conceptual and material 
resources, the teaching processes and practices, and the organizational routines and 
cultures that shape or influence those outcomes. (p. 93) 


With this contextualist view in mind, we offer some pragmatic suggestions for 
TPA-related policy, practice, and research based on our review. We believe careful and 
strategic attention to these concerns is crucial to making TPAs function effectively as 
resources for meaningful program improvement. 


State Policy


Our review suggests several actions that might be undertaken at the state policy level 
to support implementation and use of TPAs for program evaluation and improvement. 


Increase Program Accountability for the Quality of TPA Implementation 


Like others, we recommend that TPAs be considered in conjunction with other mea- 
sures of teaching competence and performance and used within deliberative processes 
leading to licensure recommendations at the local program level (Darling-Hammond, 
2010; Moss, 2013; Pecheone & Chung, 2006; Reagan et al., 2016). However, we also sug- 
gest that this moderated approach to the uses of standardized TPAs be balanced with 
more strategic and effective accountability policies related to the quality of program- 
matic support for candidates in completing the TPAs, particularly if they are required 
for licensure. We are not suggesting accountability policies based on program rankings 
or statewide benchmarks related to TPA pass rates (Reagan et al., 2016). Rather, we are 
suggesting that programs be expected to measure, evaluate, and, as necessary, improve 
their supports for candidate preparation related to the performance expectations of 
any program- or state-level TPA, and that evidence of program-level commitments to 
continuous monitoring and improvement of TPA preparation practices be incorporated 
into accreditation reviews and public reports. 

This seems particularly important in the context of the considerable evidence sug- 
gesting that program supports for candidates related to preparation for TPAs vary 
widely (Cohen et al., 2020; De Voto et al., 2020) and sometimes reflect intentional faculty 
disengagement from efforts to prepare candidates to succeed with TPAs (Ledwell &
Oyler, 2016; Ratner & Kolman, 2016). While faculty deflection of accountability policies, 
such as those underlying many contemporary TPA-related initiatives, may be grounded 
in a thoughtfully considered set of professional and political values (e.g., Kornfeld et 
al., 2007), it seems reasonable that candidates be apprised of this faculty stance before 
investing in a program that they expect to prepare them for licensure. 

A useful example of program-focused accountability policies related to TPA
implementation has been developed in California, where clear expectations for practicum
site selection, candidate support, and program-level analysis of TPA data for the
purposes of program improvement are made explicit.2 Additionally, Kim and Sato (2019) have
developed a set of survey tools that are aligned with questions about the quality of 
TPA implementation that the research literature suggests are important to monitor and 
evaluate. Resources such as these could be useful tools in increasing program account- 
ability for the quality of TPA support candidates receive. 


Increase State-Level Supports for Cross-Program Learning and Improvement 


Several of the studies we reviewed suggested the value of state-level supports for
cross-program collaboration, learning, and improvement. These “capacity-building”
policies include creation and support of standing state-level collaborative committees
aimed at supporting TPA implementation (De Voto et al., 2020), state-sponsored TPA
trainings (Warner et al., 2020), and state conferences sponsored in collaboration with
AACTE-affiliated teacher education program networks (Peck et al., 2012).

3 See https://www.ctc.ca.gov/docs/default-source/educator-prep/standards/prelimmsstandard-pdf.pdf?sfvrsn=a35b06c_4.

One of the important affordances of having a common and standardized TPA is 
the opportunity it represents for state- or national-level consortia to identify shared
needs and undertake collaborative actions for program improvement. For example, 
SCALE has recently convened a group of more than 50 teacher education programs 
from around the United States that have been highly successful in preparing their 
candidates of color for the edTPA over the past 5 years (Pecheone, personal communi- 
cation, March 11, 2021). These programs are working collectively to identify common 
themes underlying their successes and develop a set of policy/practice briefs to share 
what they have learned with other programs. The opportunity to undertake large-scale
cross-program analyses of this type is predicated on having common measures and a 
common language that allows practitioners and researchers from diverse institutional 
and state policy contexts to learn from one another. For a field that has been character- 
ized by its struggles to build a shared and cumulative knowledge base (Sleeter, 2014; 
Wideen et al., 1998; Zeichner, 2007), this opportunity seems important. 


Program Practices 


In addition to state-level policies, our review suggests several program-level prac- 
tices that appear likely to support the implementation of a TPA and enhance its value 
as a resource for local learning and program improvement. 


Develop a Leadership Plan 


Strategic leadership is critical to making TPAs useful and used for program improve- 
ment. Leadership strategies should include plans for supporting program-level delib- 
eration and sense-making related to examining the TPA itself and considering the 
purposes and possibilities it may afford for achieving local program improvement. 
Leadership planning should also clarify strategies for distributing leadership oppor- 
tunities and related responsibilities broadly across the program (Sloan, 2013) and for 
balancing the voices of program members that are engaged in coursework and field- 
work (Peck et al., 2010; Steadman & Dobson, 2018). 


Invest in Professional Development 


A consistent finding from case studies of TPA implementation is the importance of 
professional development opportunities for faculty, staff, and P-12 colleagues. A variety 
of resources exist to support informational aspects of training related to the content and 
process of the assessments, particularly for widely used assessments such as the edTPA
(National Education Association, 2021). However, case reports also underscore the 
critical role that opportunities for collaborative discussion, deliberation, and problem- 
solving play not just in learning about the tools, but in building a shared vision and 
commitment to using and learning from the assessments (Cuthrell et al., 2019; Lachuk 
& Koellner, 2015; Sloan, 2013). 

Some of the most powerful professional development experiences reported in these
case studies refer to joint work activities undertaken by faculty, field supervisors, and 
mentor teachers. For example, several studies suggest the value of local program mem- 
bers scoring (at least a portion of) TPAs completed by their own candidates (Bastian 
et al., 2016; Cuthrell et al., 2019; Peck et al., 2010; Sloan, 2013). One important value of 
direct participation in scoring lies in the process of calibration, in which program fac- 
ulty, supervisors, and cooperating teachers can begin to build a common understanding 
of the affordances (and limitations) of a TPA and a common language for describing 
candidate teaching practice. 


Disaggregate TPA Data 


Aggregated TPA scores can be useful as a beginning point for identifying gen- 
eral areas of program strength and weakness. However, further decomposition and 
analysis of scores (Bastian et al., 2018) and/or systematic qualitative analysis of TPA 
artifacts are likely to be particularly useful for identifying program-level curriculum and
instructional variables related to improvement goals (Bunch et al., 2009; Peck et al., 
2010; Sandoval et al., 2020; Sloan, 2013). Several studies suggest that opportunities for 
collaborative examination of concrete artifacts of teaching practice across a sample of 
TPA portfolios may ameliorate status differences between university faculty and field- 
based teacher educators, as field supervisors and mentor teachers are generally well- 
positioned to interpret assessment findings because of their more intimate knowledge 
of classroom conditions affecting candidate teaching (Cuthrell et al., 2018; Peck et al., 
2010; Sloan, 2013; Whittaker & Nelson, 2013). 


Create Specific Organizational Supports for TPA Work 


Making a TPA function effectively as a resource for program improvement requires 
strategic investment of time to support the work. Case studies of TPA implementa- 
tion suggest that integration of focused TPA implementation planning into regularly 
scheduled administrative and programmatic meetings as well as the creation of special 
events such as retreats or “Data Days” are both useful strategies. The success of either 
requires careful planning and preparation and is contingent on strong and proactive 
program leadership (Kroeger, 2019; Lys et al., 2014; Sloan et al., 2021). 


Research Priorities 


The research base related to TPAs is extremely fragmented. Psychometric studies 
evaluating the reliability and validity of these measures have been useful for identifying 
important questions about these measures and how they are used (Bastian et al., 2016; 
Gitomer et al., 2019; Goldhaber et al., 2017; Stanford Center for Assessment, Learning, 
and Equity, 2015). However, with notable exceptions (e.g., Cohen et al., 2020; De Voto 
et al., 2020; Kim & Sato, 2019), implementation studies have been limited in both scope 
and method. Based on this general appraisal of the status of TPA-related research, we 
suggest several priorities for inquiry. 


TPAs and P-12 Student Achievement


Investment in TPAs is predicated on the assumption that variability in teaching per- 
formance as measured by these tools is related to learning outcomes for P-12 students. 
The evidence on this point is inconclusive but promising (Bastian et al., 2016; Chen 
et al., 2021; Goldhaber et al., 2017; National Research Council, 2008; Stanford Center 
for Assessment, Learning, and Equity, 2015). Evidence related to the effects of teacher 
education program improvement interventions on the learning of P-12 students is even 
more limited. Gansle et al. (2015) report one well-documented example of the kind of 
research linking program improvement actions to P-12 student outcomes that would 
be helpful. While the notion of “using data for program improvement” represents an 
intuitively appealing and highly rational theory of action, the fact remains that the 
field is in serious need of research that clarifies the conditions under which faith in 
this theory is justified. 


Equity Issues 


Existing research has raised important questions about differences in TPA perfor- 
mance between historically marginalized racial and cultural groups and their White 
peers (e.g., Goldhaber et al., 2017). However, results of these studies have been incon- 
sistent. There is a clear and pressing need to replicate and extend research in this area. 
The troubling findings from Williams et al. (2019) suggest the importance of undertak- 
ing more process-oriented studies of program-level policies and practices that may 
contribute to TPA performance of minoritized teacher candidates. 

Several efforts to engage equity concerns about TPAs have also been undertaken 
through collaborations among teacher education practitioners. For example, as described 
above, SCALE has recently assembled a cadre of programs that have either consistently
achieved high edTPA scores with candidates of color or shown substantial improvements
on these scores over time (Pecheone, personal communication, March 11, 2021). The
members of this group are currently working to identify innovations, as well as shared 
themes in practice, that may contribute to improvements in preparation of candidates 
of color to succeed with these assessments at the same levels as White peers. Other 
efforts to engage equity issues related to TPAs have focused on critical analysis of the 
instruments themselves, as well as improving procedures for analyzing the data they 
produce (Escalante et al., 2021; Stillman et al., 2013). Taken together, these reports
suggest that equity-related concerns about TPAs, like other standardized assessments in
education, have become a focal point for both critique and improvement of both
the instruments themselves and the ways they are used in evaluating preservice
teacher quality.


Learning Processes 


Despite the rationale and promise of performance assessments as resources for 
candidate and program-level learning and improvement (Darling-Hammond & Snyder, 
2000; Haertel, 1991; Pecheone & Chung, 2006), we need to know much more about the 
conditions under which these learning processes take place. The contexts and conditions
under which teacher candidates learn and improve their practice are an important
locus of this research (Chung, 2008; Lin, 2015). We also need to know more about
faculty learning, including how faculty learning at the individual level is related to 
collective learning and program-level decision-making (Peck et al., 2009), and how 
these processes themselves may be mediated by professional networks of affiliation 
and influence (Cohen et al., 2020). 


Intervention Studies 


The research on TPA implementation is almost entirely descriptive and retrospective 
in nature. Some of these studies have produced rich descriptions of promising clinical 
practice (e.g., Alloway & Lesh, 2019) that merit further evaluation using more prospec- 
tive, intervention-oriented methodologies. The case-based intra-subject experimental 
designs used in clinical research fields such as psychology and special education may 
be particularly useful in evaluation of specific program practices (Perdices & Tate, 2009; 
Sidman, 1960). Intra-subject research methods support focused evaluation of specific 
changes in local practice, using pre-intervention measures in each individual case as a 
basis for comparison with post-intervention outcomes aimed at program improvement 
goals. While systematic analysis and improvement of local practice is the first goal of 
intra-subject research methodology, systematic replication of these kinds of studies, 
even with small numbers of participants in each, can provide a basis for developing 
and evaluating innovations in practice that are useful across programs. 

Formative intervention methodologies (Penuel, 2014) such as design-based imple- 
mentation research (Fishman et al., 2013), developmental work research (Sannino et al., 
2016), and improvement science (Bryk et al., 2015) are particularly useful for studying 
and improving processes of learning, negotiation, and organizational change at the 
program and institutional level. A distinguishing feature of these methods is their focus 
on collaborative processes of problem identification, data collection, and action related 
to change—collaborations that are essential to making TPA data matter for program 
improvement. Both intra-subject and formative intervention research methodologies 
share an additional feature, which is that their focus on evaluation of program improve- 
ment actions with small sample sizes allows these methods to be “locally owned and 
operated” in a way that is consonant with faculty values around local program author- 
ity and autonomy. 


CONCLUSION 


In an early review of the present paper, a colleague asked us “so ... do you think 
that TPAs are a good thing or a bad thing?” Our response to this provocative question is 
that TPAs are tools; like any tool, their value very much depends on how they are used 
and what they are used for. Our review suggests that the kinds of rich and contextual- 
ized data that TPAs produce can create valuable opportunities for learning, innovation, 
and program improvement (Denner et al., 2009; Lys et al., 2014; Peck et al., 2009;
Sloan, 2013). However, taking up these opportunities requires strategic organizational
supports for learning that are often weak or missing in both state policy and local program
routines and practice. Making TPAs more useful and more regularly used as resources
for program evaluation, learning, and improvement will require not only the continu- 
ous improvement of the tools themselves, but more focused and strategic attention to 
improving the organizational conditions in which they are implemented. 